Assessment of P-value variability in the current replicability crisis

Authors

  • Olga A. Vsevolozhskaya
  • Gabriel Ruiz
  • Dmitri V. Zaykin
Abstract

Increased availability of data and accessibility of computational tools in recent years have created unprecedented opportunities for scientific research driven by statistical analysis. Inherent limitations of statistics impose constraints on the reliability of conclusions drawn from data, but misuse of statistical methods is a growing concern. Significance, hypothesis testing and the accompanying P-values are being scrutinized as representing the most widely applied and abused practices. One line of critique is that P-values are inherently unfit to fulfill their ostensible role as measures of the credibility of a scientific hypothesis. It has also been suggested that while P-values may have their role as summary measures of effect, researchers underappreciate the degree of randomness in the P-value. High variability of P-values would suggest that, having obtained a small P-value in one study, one is nevertheless likely to obtain a much larger P-value in a similarly powered replication study. Thus, "replicability of P-value" is itself questionable. To characterize P-value variability, one can use prediction intervals whose endpoints reflect the likely spread of P-values that could have been obtained by a replication study. Unfortunately, the intervals currently in use, the P-intervals, are based on unrealistic implicit assumptions. Namely, P-intervals are constructed with assumptions that imply substantial chances of encountering large values of effect size in an observational study, which leads to bias. The long-run coverage property provided by P-intervals is similar in interpretation to the coverage provided by classical confidence intervals, but the endpoints of any particular interval lack interpretation as probabilistic bounds for the possible spread of future P-values that may be obtained in replication studies. As an alternative to P-intervals, we develop a method that gives researchers flexibility by providing them with the means to control these assumptions. Unlike endpoints of P-intervals, endpoints of our intervals are directly interpreted as probabilistic bounds for replication P-values and are resistant to selection bias, contingent upon approximate prior knowledge of the effect size distribution. We showcase our approach by applying it to P-values reported for five psychiatric disorders by the Psychiatric Genomics Consortium group.
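
To make the idea of a prediction interval for a replication P-value concrete, here is a minimal Python sketch. It is not the authors' method. It assumes a one-sided z-test, a replication of the same size as the original study, and a normal prior on the standardized effect; the function name and the parameters `prior_mean` and `prior_sd` are hypothetical. Leaving `prior_sd` unset stands for a flat prior, the kind of assumption the abstract criticizes as implying substantial chances of large effect sizes.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for P-value <-> z conversions


def p_prediction_interval(p_obs, level=0.80, prior_mean=0.0, prior_sd=None):
    """Prediction interval for a one-sided replication P-value.

    Illustrative assumptions (not taken from the article):
      z_obs ~ N(mu, 1), replication z_rep ~ N(mu, 1), same sample size,
      prior mu ~ N(prior_mean, prior_sd**2); prior_sd=None means a flat prior.
    """
    if not 0.0 < p_obs < 1.0:
        raise ValueError("p_obs must lie strictly between 0 and 1")

    z_obs = STD_NORMAL.inv_cdf(1.0 - p_obs)

    if prior_sd is None:
        # Flat prior: the posterior for mu is centered at the observed z.
        post_mean, post_var = z_obs, 1.0
    else:
        # Conjugate normal update: precision-weighted average of prior and data.
        precision = 1.0 / prior_sd**2 + 1.0
        post_mean = (prior_mean / prior_sd**2 + z_obs) / precision
        post_var = 1.0 / precision

    # Predictive distribution of z_rep: posterior spread plus sampling noise.
    pred_sd = sqrt(post_var + 1.0)
    q = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    z_low = post_mean - q * pred_sd
    z_high = post_mean + q * pred_sd

    # A large z gives a small P-value, so the endpoints swap order.
    return STD_NORMAL.cdf(-z_high), STD_NORMAL.cdf(-z_low)


flat = p_prediction_interval(0.025)
shrunk = p_prediction_interval(0.025, prior_mean=0.0, prior_sd=1.0)
print("flat prior:       ", flat)
print("prior mu ~ N(0,1):", shrunk)
```

Under these assumptions, an observed one-sided P-value of 0.025 with a flat prior yields an 80% interval running from roughly 0.0001 to about 0.44, which shows how widely a replication P-value can vary. Placing a prior centered at zero on the effect shifts both endpoints toward larger P-values, because the observed effect is shrunk toward the prior mean.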

Similar resources

Is the call to abandon p-values the red herring of the replicability crisis?

In a recent article, Cumming (2014) called for two major changes to how psychologists conduct research. The first suggested change—encouraging transparency and replication—is clearly worthwhile, but we question the wisdom of the second suggested change: abandoning p-values in favor of reporting confidence intervals (CIs) only in all psychological research reports. This article has three goals. ...

The validity, diagnostic value and replicability of Bender Visual-Motor Gestalt Test in traumatic brain injury patients

Introduction: The Bender Gestalt test is one of the best-known neuropsychological tests; it is simple and can be used to examine brain injuries. The objective of this research was to investigate the validity, diagnostic strength and replicability of the Bender Visual-Motor Gestalt Test in patients with traumatic brain injury (TBI). Methods: 240 participants were tested in a case-control st...

How to Minimize the Impact of Pandemic Events: Lessons From the COVID-19 Crisis

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the current pandemic of coronavirus disease 2019 (COVID-19). This pandemic is characterized by a high variability in death rate (defined as the ratio between the number of deaths and the total number of infected people) across world countries. Several possible explanations have been proposed, but it is not clear whe...

From Discovery to Justification: Outline of an Ideal Research Program in Empirical Psychology

The gold standard for an empirical science is the replicability of its research results. But the estimated average replicability rate of key-effects that top-tier psychology journals report falls between 36 and 39% (objective vs. subjective rate; Open Science Collaboration, 2015). So the standard mode of applying null-hypothesis significance testing (NHST) fails to adequately separate stable fr...

Baltic hard bottom mesocosms unplugged: replicability, repeatability and ecological realism examined by non-parametric multivariate techniques

The general utility of large-scale artificial ecosystems for ecological and ecotoxicological research is evaluated by a case study of the replicability, repeatability and ecological realism of a Baltic Sea hard bottom littoral mesocosm (called BHB-mesocosm). The structure (species abundance and biomass listings) of the macrofauna community associated with bladder-wrack, Fucus vesiculosus L., in...

Publication date: 2016